XcisClique: analysis of regulatory bicliques

Author: Grene Ruth
Heath Lenwood S
Murali TM
Pati Amrita
Vasquez-Robinet Cecilia
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Modeling of cis-elements or regulatory motifs in promoter (upstream) regions of genes is a challenging computational problem. In this work, a set of regulatory motifs simultaneously present in the promoters of a set of genes is modeled as a biclique in a suitably defined bipartite graph. A biologically meaningful co-occurrence of multiple cis-elements in a gene promoter is assessed by the combined analysis of genomic and gene expression data. Greater statistical significance is associated with a set of genes that shares a common set of regulatory motifs while simultaneously exhibiting highly correlated gene expression under given experimental conditions. METHODS: XcisClique, the system developed in this work, is a comprehensive infrastructure that associates annotated genome and gene expression data, models known cis-elements as regular expressions, identifies maximal bicliques in a bipartite gene-motif graph, and ranks bicliques based on their computed statistical significance. Significance is a function of the probability of occurrence of those motifs in a biclique (a hypergeometric distribution) and of the new sum of absolute values (SAV) statistic, which uses Spearman correlations of gene expression vectors. SAV is a statistic well suited for this purpose, as described in the discussion. RESULTS: XcisClique identifies new motif and gene combinations that might indicate as yet unidentified involvement of sets of genes in biological functions and processes. It currently supports Arabidopsis thaliana and can be adapted to other organisms, assuming the existence of annotated genomic sequences, suitable gene expression data, and identified regulatory motifs. A subset of XcisClique functionalities, including the motif visualization component MotifSee, source code, and supplementary material are available at
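
The biclique model described in this abstract can be illustrated with a minimal sketch. It assumes the gene-motif graph is given as a dictionary from gene names to the motifs found in their promoters. The function name, the brute-force enumeration, and the toy gene and motif labels are illustrative assumptions, not the XcisClique implementation.

```python
from itertools import combinations


def maximal_bicliques(gene_motifs, min_genes=2, min_motifs=1):
    """Enumerate maximal bicliques (gene set, motif set) by brute force.

    gene_motifs maps each gene to the set of motifs in its promoter.
    Exponential in the number of genes, so only suitable for toy inputs.
    """
    genes = sorted(gene_motifs)
    found = set()
    for size in range(min_genes, len(genes) + 1):
        for subset in combinations(genes, size):
            # Motifs shared by every gene in the subset.
            shared = frozenset.intersection(
                *(frozenset(gene_motifs[g]) for g in subset)
            )
            if len(shared) < min_motifs:
                continue
            # Close the gene side: every gene that carries all shared motifs.
            closed = frozenset(
                g for g in genes if shared <= frozenset(gene_motifs[g])
            )
            found.add((closed, shared))
    return found


# Hypothetical toy input: three genes and three motif labels.
example = {
    "g1": {"ABRE", "DRE"},
    "g2": {"ABRE", "DRE", "G-box"},
    "g3": {"G-box"},
}
bicliques = maximal_bicliques(example)
```

Each returned pair is closed on both sides: no gene can be added without losing a motif, and no motif can be added without losing a gene. Ranking such pairs by statistical significance, as the abstract describes, would be a separate step.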

University of Toronto Research Repository

Predicting protein functions by relaxation labelling protein interaction network

Author: AD Marshall
AL Barabasi
Andrew Emili
B Schwikowski
D Lin
E Nabieva
H Lee
HN Chua
Hui Jiang
J McDermott
JZ Wang
M Ashburner
M Riley
P Hu
P Resnik
Pingzhao Hu
PW Lord
R Jansen
R Jansen
TM Murali
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: One of the key issues in the post-genomic era is to assign functions to uncharacterized proteins. Proteins seldom act alone; rather, they must interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered by studying their interactions with proteins having known functions. Although many approaches have been developed for this purpose, one of the main limitations of most of these methods is that the dependence among functional terms has not been taken into account. Results: We developed a new network-based protein function prediction method which combines the likelihood scores of local classifiers with a relaxation labelling technique. The framework can incorporate the inter-relationship among functional labels into the function prediction procedure and allows us to efficiently discover relevant non-local dependence. We compared the performance of the new method with that of one other representative network-based function prediction method using E. coli protein functional association networks. Conclusion: Our results showed that the new method has better prediction performance than the previous method. The better predictive power of our method gives new insights into the importance of the dependence between functional terms in protein function prediction.
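
To make the relaxation labelling idea concrete, the sketch below applies one classic update rule, in which each label's probability is multiplied by one plus the support it receives from neighbouring nodes and the result is renormalised. The abstract does not give the paper's exact rule, so the update formula, the function name, the compatibility table, and the toy network are all assumptions made for illustration.

```python
def relaxation_labelling(init, neighbours, compat, iterations=10):
    """Iteratively refine per-node label probabilities.

    init:       node -> {label: probability}, e.g. local classifier scores
    neighbours: node -> list of neighbouring nodes
    compat:     compat[a][b] in [0, 1], how well label a on a node agrees
                with label b on a neighbour (this is where dependence
                between functional terms enters)
    """
    labels = list(next(iter(init.values())))
    probs = {node: dict(p) for node, p in init.items()}
    for _ in range(iterations):
        updated = {}
        for node, p in probs.items():
            nbrs = neighbours.get(node, [])
            if not nbrs:
                updated[node] = dict(p)
                continue
            # Average support for each label from the neighbours' current beliefs.
            support = {
                a: sum(compat[a][b] * probs[m][b] for m in nbrs for b in labels)
                / len(nbrs)
                for a in labels
            }
            raw = {a: p[a] * (1.0 + support[a]) for a in labels}
            total = sum(raw.values())
            updated[node] = {a: raw[a] / total for a in labels}
        probs = updated
    return probs


# Hypothetical toy network: U is uncertain, A and B lean towards function f1.
init = {
    "A": {"f1": 0.9, "f2": 0.1},
    "B": {"f1": 0.9, "f2": 0.1},
    "U": {"f1": 0.5, "f2": 0.5},
}
neighbours = {"A": ["U"], "B": ["U"], "U": ["A", "B"]}
compat = {"f1": {"f1": 1.0, "f2": 0.0}, "f2": {"f1": 0.0, "f2": 1.0}}
refined = relaxation_labelling(init, neighbours, compat)
```

In this toy run the uncertain node U drifts towards f1 because both of its neighbours favour f1 and the compatibility table rewards agreement.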

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Author: A Ben-Dor
A Prelic
A Rosenwald
A Tanay
AD Basehoar
Akdes Serin
B Andreopoulos
BKH Chia
CT Harbison
D Burdick
DR Ciocca
G Li
GA Grothaus
J Lamb
JA Hartigan
JA Hartigan
JL Jensen
JN Keller
KD MacIsaac
Martin Vingron
R Shamir
RR Sokal
S Barkow
S Bergmann
S Hochreiter
SC Madeira
TM Murali
TR Hughes
XG Ni
Y Cheng
Y Hoshida
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The analysis of massive high-throughput data via clustering algorithms is very important for elucidating gene functions in biological systems. However, traditional clustering methods have several drawbacks. Biclustering overcomes these limitations by grouping genes and samples simultaneously. It discovers subsets of genes that are co-expressed in certain samples. Recent studies showed that biclustering has great potential in detecting marker genes that are associated with certain tissues or diseases. Several biclustering algorithms have been proposed. However, it is still a challenge to find biclusters that are significant based on biological validation measures. In addition, there is a need for a biclustering algorithm that is capable of analyzing very large datasets in reasonable time. Results: Here we present a fast biclustering algorithm called DeBi (Differentially Expressed BIclusters). The algorithm is based on a well-known data mining approach called frequent itemset mining. It discovers maximum-size homogeneous biclusters in which each gene is strongly associated with a subset of samples. We evaluate the performance of DeBi on a yeast dataset, on synthetic datasets and on human datasets. Conclusions: We demonstrate that the DeBi algorithm provides functionally more coherent gene sets compared to standard clustering or biclustering algorithms using biological validation measures such as Gene Ontology term and Transcription Factor Binding Site enrichment. We show that DeBi is a computationally efficient and powerful tool for analyzing large datasets. The method is also applicable to multiple gene expression datasets coming from different labs or platforms.
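
The frequent itemset idea behind this abstract can be sketched with a small level-wise (Apriori-style) miner. The sketch assumes the expression matrix has already been binarised, that each sample is treated as a transaction, and that genes are the items. That orientation, the function name, and the toy data are assumptions for illustration and are not taken from DeBi itself.

```python
def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(items):
        return sum(1 for t in transactions if items <= t)

    singletons = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in singletons if support(s) >= min_support}
    frequent = {}
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        size = len(next(iter(level))) + 1
        # Join frequent itemsets that differ in one item to form candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical binarised data: genes that are "on" in each of three samples.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
]
frequent = frequent_itemsets(samples, min_support=2)
```

Under these assumptions, a frequent gene set together with the samples that support it forms a candidate bicluster. Selecting maximum-size homogeneous biclusters, as the abstract describes, would require further steps not shown here.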

MPG.PuRe

A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

Author: A Ben-Dor
A Prelic
A Tanay
AP Gasch
Arlindo L Oliveira
C Wu
D Gusfield
D Martin
E Yang
GJ McLachlan
IP Androulakis
IV Mechelen
J Liu
J Liu
J Liu
J Liu
L Ji
L Ji
M Koyuturk
MC Teixeira
MF Sagot
Q Sheng
R Peeters
S Lonardi
Sara C Madeira
SC Madeira
SC Madeira
SC Madeira
SC Madeira
SC Madeira
TM Murali
Y Cheng
Y Zhang
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters. Methods: In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters. Results: We present results on real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state-of-the-art methods that require exact matching of gene expression time series. Discussion: The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms. Availability: A prototype implementation of the algorithm coded in Java, together with the dataset and examples used in the paper, is available at http://kdbio.inesc-id.pt/software/e-ccc-biclustering.
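
The contiguous-column restriction can be illustrated with a simplified sketch that handles only exact pattern matches, not the approximate (e-CCC) patterns the paper targets. It assumes each gene's time series has already been discretised into a string over a small alphabet (here U, D, N for up, down, no change). The alphabet, the function name, and the simple quadratic grouping are assumptions; the paper's own string processing techniques are not reproduced here.

```python
from collections import defaultdict


def ccc_biclusters(patterns, min_genes=2, min_cols=2):
    """Find maximal exact contiguous-column coherent biclusters.

    patterns maps gene -> discretised string, all of the same length.
    Returns a set of (genes, start_column, shared_pattern) triples.
    """
    n_cols = len(next(iter(patterns.values())))
    groups = defaultdict(set)
    # Group genes that show the same symbols over the same column window.
    for gene, s in patterns.items():
        for start in range(n_cols):
            for end in range(start + min_cols, n_cols + 1):
                groups[(start, s[start:end])].add(gene)
    cands = {k: frozenset(g) for k, g in groups.items() if len(g) >= min_genes}

    def is_maximal(start, pat, genes):
        # Not maximal if the same gene set also agrees on a wider window.
        for (s2, p2), g2 in cands.items():
            if (s2, p2) == (start, pat) or g2 != genes:
                continue
            if s2 <= start and s2 + len(p2) >= start + len(pat):
                return False
        return True

    return {
        (genes, start, pat)
        for (start, pat), genes in cands.items()
        if is_maximal(start, pat, genes)
    }


# Hypothetical discretised time series for three genes over four transitions.
toy = {"g1": "UUDN", "g2": "UUDD", "g3": "NUDN"}
found = ccc_biclusters(toy)
```

Because every window is contiguous, the number of candidate windows per gene is quadratic in the number of columns, which gives an intuition for why this restricted problem is tractable.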

Construction of gene regulatory networks using biclustering and bayesian networks

Author: A Ben-Dor
A Faisal
A Prelic
A Tanay
AC Lozano
AP Gasch
C Wolfe
CT Ronald
D Jesse
D Reiss
F Azuaje
Fadhl M Alakwaa
FM Al-Akwaa
FM Alakwaa
G Bader
G Fung
G Stolovitzky
I Avila-Campillo
J Ihmels
KO Cheng
MD Dyer
N Friedman
Nahed H Solouma
O Troyanskaya
P D haeseleer
P D'haeseleer
P Shannon
Pe Dana
PTSG Spellman
R Bonneau
R Guthke
S Barkow
S Datta
S Kauffman
S Maere
S Tavazoie
SC Madeira
T Chen
TM Murali
X Liu
Xw Chen
Y Assenov
Y Cheng
Yasser M Kadah
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling. Results: In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method. Conclusions: Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.

Cape Town University OpenUCT

Scoring Protein Relationships in Functional Interaction Networks Predicted from Sequence Data

Author: A Vazquez
B Schwikowski
C von Mering
C von Mering
CE Shannon
Christophe Herman
CL Myers
D Devos
E Nabieva
G Subramanian
Gaston K. Mazandu
GRG Lanckriet
HN Chua
HN Chua
HN Chua
J Krawczyk
J Xiong
JCD Mackay
K Raman
K Tsuda
LJ Jensen
M Deng
M Deng
M Li
MA Mahdavi
Nicola J. Mulder
NJ Mulder
NJ Mulder
O Bastian
O Bastian
OG Troyanskaya
P Baldi
PG Aaron
RVL Hartley
S Hunter
S Letovsky
S Yellaboina
SF Altschul
SF Altschul
SF Altschul
TM Murali
WR Pearson
X Mao
Y Chen
Y-R Cho
Publication venue: Public Library of Science
Publication date: 01/01/2010

The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods to integrate and exploit these data effectively and to elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict pairwise functional relationships between proteins, and so contribute to function prediction for uncharacterized proteins, ensuring that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with high confidence and coverage. We use the network for predicting functions of uncharacterised proteins.

CiteSeerX

Public Library of Science (PLOS)

Network-Based Prediction and Analysis of HIV Dependency Factors

HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other
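
The abstract does not spell out how SinkSource computes its predictions. The sketch below shows one simple network-propagation scheme in that spirit, under the assumption that known HDFs act as sources fixed at score 1, negative examples act as sinks fixed at score 0, and every other protein repeatedly takes the weighted average of its neighbours' scores. The function name, the update rule, and the toy network are assumptions for illustration.

```python
from collections import defaultdict


def sink_source(edges, positives, negatives, iterations=100):
    """Propagate scores over a weighted, undirected interaction network.

    edges:     iterable of (node_u, node_v, weight)
    positives: nodes fixed at 1.0 (assumed "sources")
    negatives: nodes fixed at 0.0 (assumed "sinks")
    """
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w
    score = {
        n: 1.0 if n in positives else 0.0 if n in negatives else 0.5
        for n in nbrs
    }
    for _ in range(iterations):
        for n in nbrs:
            if n in positives or n in negatives:
                continue  # labelled nodes keep their fixed score
            total = sum(nbrs[n].values())
            score[n] = sum(w * score[m] for m, w in nbrs[n].items()) / total
    return score


# Hypothetical chain: known HDF "P" - a - b - negative example "N".
edges = [("P", "a", 1.0), ("a", "b", 1.0), ("b", "N", 1.0)]
scores = sink_source(edges, positives={"P"}, negatives={"N"})
```

On this chain the scores fall off with distance from the positive node, so unlabelled proteins closer to known HDFs rank higher as candidates.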

Public Library of Science (PLOS)

Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures

The liver plays a vital role in glucose homeostasis, the synthesis of bile acids and the detoxification of foreign substances. Liver culture systems are widely used to test adverse effects of drugs and environmental toxicants. The two most prevalent liver culture systems are hepatocyte monolayers (HMs) and collagen sandwiches (CS). Despite their wide use, comprehensive transcriptional programs and interaction networks in these culture systems have not been systematically investigated. We integrated an existing temporal transcriptional dataset for HM and CS cultures of rat hepatocytes with a functional interaction network of rat genes. We aimed to exploit the functional interactions to identify statistically significant linkages between perturbed biological processes. To this end, we developed a novel approach to compute Contextual Biological Process Linkage Networks (CBPLNs). CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were successful in interpreting within the context of liver metabolism. Multiple phenomena captured by CBPLNs at the process level such as regulation, downstream effects, and feedback loops have well described counterparts at the gene and protein level. CBPLNs reveal high-level linkages between pathways and processes, making the identification of important biological trends more tractable than through interactions between individual genes and molecules alone. Our approach may provide a new route to explore, analyze, and understand cellular responses to internal and external cues within the context of the intricate networks of molecular interactions that control cellular behavior

Reduced cortical thickness in patients with acute-on-chronic liver failure due to non-alcoholic etiology

Author: A Duseja
A Verma
AH Lockwood
AM Dale
B Ahl
BD Ross
C Hermenegildo
C Montoliu
D Amarapurkar
D Krieger
D Shawcross
DL Shawcross
E Matsusue
Ena Wang
F Miese
Francesco M. Marincola
H Maeda
J Albrecht
J Bustamante
J Cordoba
JJ Kril
JR Chen
JS Bajaj
K Nath
KV Rao
L Spahr
M Guevara
M Iwasa
M Odeh
M Reuter
M Romero-Gomez
M Skowronska
MD Leise
Michael A. Thomas
MJ Frank
ML Zeneroli
Mohammad Haris
Murali Rangan
O Cauli
P Ferenci
P Kumar Mandal
R Butterworth
R Jalan
R Kumar
R Rodrigo
Rakesh K. Gupta
RE Tarter
RF Butterworth
RK Gupta
S Umapathy
Santosh K. Yadav
Sergio Rutella
Silvio Danese
SK Sarin
SW Provencher
TM Rudkin
V Arroyo
V Dror
Vivek A. Saraswat
Publication venue: Springer Science and Business Media LLC
Publication date: 06/10/2015

Background: Acute-on-chronic liver failure (ACLF) is a form of liver disease with high short-term mortality. ACLF has considerable potential to affect cortical areas through significant tissue injury due to loss of neurons and other supporting cells. We measured changes in cortical thickness and metabolite profile in ACLF patients following treatment, and compared them with those of age-matched healthy volunteers. Methods: For the cortical thickness analysis, we performed whole-brain high-resolution T1-weighted magnetic resonance imaging (MRI) on 15 ACLF patients and 10 healthy volunteers on a 3T clinical MR scanner. Proton MR spectroscopy (1H MRS) was also performed to measure the levels of altered metabolites. Of the 15 ACLF patients, 10 survived and underwent a follow-up study after clinical recovery at 3 weeks. The FreeSurfer program was used to quantify cortical thickness and LCModel software was used to quantify absolute metabolite concentrations. Neuropsychological (NP) tests were performed to assess cognitive performance in follow-up ACLF patients compared to controls. Results: Significantly reduced cortical thickness in multiple brain sites, significantly decreased N-acetyl aspartate (NAA) and myo-inositol (mI), and significantly increased glutamate/glutamine (Glx) were observed in ACLF patients compared to controls at the baseline study. Follow-up patients showed significant recovery in cortical thickness and Glx level, while NAA and mI had partially recovered compared to the baseline study. When compared to controls, follow-up patients still showed reduced cortical thickness and altered metabolite levels. Follow-up patients had abnormal NP scores compared to controls. Conclusions: Neuronal loss, as suggested by the reduced NAA; decreased cellular density due to increased cerebral hyperammonemia, as supported by the increased Glx level; and increased proinflammatory cytokines and free radicals may account for the reduced cortical thickness in ACLF patients. The presence of reduced cortical thickness, altered metabolites and abnormal NP test scores in post-recovery subjects compared to controls is associated with incomplete clinical recovery. The current imaging protocol can be easily implemented in clinical settings to evaluate and monitor brain tissue changes in patients with ACLF during the course of treatment.

Nottingham Trent Institutional Repository (IRep)